57 research outputs found
Perceptual Grouping for Contour Extraction
This paper describes an algorithm that efficiently groups line segments into perceptually salient contours in complex images. A measure of affinity between pairs of lines is used to guide group formation and limit the branching factor of the contour search procedure. The extracted contours are ranked and presented as a contour hierarchy. Our algorithm is able to extract salient contours in the presence of texture, clutter, and repetitive or ambiguous image structure. We show experimental results on a complex line-set.
Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients
We design a novel algorithm for optimal transport by drawing from the
entropic optimal transport, mirror descent and conjugate gradients literatures.
Our scalable and GPU parallelizable algorithm is able to compute the
Wasserstein distance with extreme precision, reaching relative error rates of
without numerical stability issues. Empirically, the algorithm
converges to high precision solutions more quickly in terms of wall-clock time
than a variety of algorithms including log-domain stabilized Sinkhorn's
Algorithm. We provide careful ablations with respect to algorithm and problem
parameters, and present benchmarking over upsampled MNIST images, comparing to
various recent algorithms over high-dimensional problems. The results suggest
that our algorithm can be a useful addition to the practitioner's optimal
transport toolkit.
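For orientation, the log-domain stabilized Sinkhorn baseline that the abstract compares against can be sketched as follows. This is the well-known baseline, not the paper's mirror descent and conjugate gradient algorithm. The function names, the regularization value `eps`, and the iteration count are illustrative assumptions, not taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_log(a, b, cost, eps=0.05, iters=200):
    """Entropic optimal transport via log-domain Sinkhorn updates.

    a, b  : strictly positive marginals that each sum to 1
    cost  : len(a) x len(b) cost matrix as a list of lists
    eps   : entropic regularization strength (assumed value)
    Returns the transport plan and its transport cost.
    """
    n, m = len(a), len(b)
    f = [0.0] * n  # dual potential for the rows
    g = [0.0] * m  # dual potential for the columns
    for _ in range(iters):
        # Row update: make every row of the plan sum to a[i].
        for i in range(n):
            f[i] = eps * (math.log(a[i]) - logsumexp(
                [(g[j] - cost[i][j]) / eps for j in range(m)]))
        # Column update: make every column of the plan sum to b[j].
        for j in range(m):
            g[j] = eps * (math.log(b[j]) - logsumexp(
                [(f[i] - cost[i][j]) / eps for i in range(n)]))
    plan = [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```

Working with the potentials `f` and `g` instead of exponentiated scaling vectors is what keeps the updates stable for small `eps`. The result is the entropically regularized cost, which only approximates the unregularized Wasserstein distance.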
Measuring Symmetry in Real-World Scenes Using Derivatives of the Medial Axis Radius Function
Symmetry has been shown to be an important principle that guides the grouping of scene information. Previously, we have described a method for measuring the local, ribbon symmetry content of line-drawings of real-world scenes (Rezanejad et al., MODVIS 2017), and we demonstrated that this information has important behavioral consequences (Wilder et al., MODVIS 2017). Here, we describe a continuous, local version of the symmetry measure that allows both ribbon and taper symmetry to be captured. Our original method looked at the difference in the radius between successive maximal discs along a symmetric axis. The number of radii differences in a local region that exceeded a threshold, normalized by the total number of differences, was used as the symmetry score at an axis point. We now use the derivative of the radius function along the symmetric axis between two contours, which gives a continuous estimate of the score that does not need a threshold. By replacing the first derivative with a second derivative, we can generalize this method so that pairs of contours which taper with respect to one another also express high symmetry. Such situations arise, for example, when parallel lines in the 3D world project onto a 2D image. This generalization will allow us to determine the relative importance of taper and ribbon symmetries in natural scenes.
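The difference between the first-derivative and second-derivative measures can be illustrated with a minimal sketch. It assumes the radius function is sampled at evenly spaced points along the symmetric axis and uses the mean absolute finite difference as a deviation value, where lower means more symmetric. The function names and this choice of summary are assumptions for illustration. The abstract does not specify how derivatives are mapped to a score.

```python
def ribbon_deviation(radii, spacing=1.0):
    """Mean absolute first derivative of the radius function.

    Zero when the two contours are parallel (constant radius).
    """
    diffs = [(radii[i + 1] - radii[i]) / spacing
             for i in range(len(radii) - 1)]
    return sum(abs(d) for d in diffs) / len(diffs)


def taper_deviation(radii, spacing=1.0):
    """Mean absolute second derivative of the radius function.

    Zero when the radius changes linearly, so both parallel and
    steadily tapering contour pairs count as symmetric.
    """
    second = [(radii[i + 1] - 2.0 * radii[i] + radii[i - 1]) / spacing ** 2
              for i in range(1, len(radii) - 1)]
    return sum(abs(d) for d in second) / len(second)


# Hypothetical samples of the radius along a symmetric axis.
parallel = [2.0, 2.0, 2.0, 2.0]   # ribbon: constant radius
tapering = [1.0, 2.0, 3.0, 4.0]   # taper: linearly growing radius
```

With these samples, the tapering pair has a nonzero first-derivative deviation but a zero second-derivative deviation, which is the case the generalization is meant to capture.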
StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Instructional videos are an important resource to learn procedural tasks from
human demonstrations. However, the instruction steps in such videos are
typically short and sparse, with most of the video being irrelevant to the
procedure. This motivates the need to temporally localize the instruction steps
in such videos, i.e. the task called key-step localization. Traditional methods
for key-step localization require video-level human annotations and thus do not
scale to large datasets. In this work, we tackle the problem with no human
supervision and introduce StepFormer, a self-supervised model that discovers
and localizes instruction steps in a video. StepFormer is a transformer decoder
that attends to the video with learnable queries, and produces a sequence of
slots capturing the key-steps in the video. We train our system on a large
dataset of instructional videos, using their automatically-generated subtitles
as the only source of supervision. In particular, we supervise our system with
a sequence of text narrations using an order-aware loss function that filters
out irrelevant phrases. We show that our model outperforms all previous
unsupervised and weakly-supervised approaches on step detection and
localization by a large margin on three challenging benchmarks. Moreover, our
model demonstrates an emergent property to solve zero-shot multi-step
localization and outperforms all relevant baselines at this task.
Comment: CVPR'2